Audio encoding



FIELD OF THE INVENTION 

The present invention relates to encoding and decoding of broadband signals, 
in particular audio signals. The invention relates both to the encoder and the decoder, and to 
an audio stream encoded according to the invention and a data storage medium on which 
such an audio stream has been stored. 

BACKGROUND OF THE INVENTION 

When transmitting broadband signals, e.g. audio signals such as speech, 
compression or encoding techniques are used to reduce the bandwidth or bit rate of the 
signal. 

Figure 1 shows a known parametric encoding scheme, in particular a 
sinusoidal encoder, which is used in the present invention, and which is described in WO 
01/69593 and European Patent Application 02080002.5 (PHNL021216). In this encoder, an 
input audio signal x(t) is split into several (possibly overlapping) time segments or frames, 
typically of duration 20 ms each. Each segment is decomposed into transient, sinusoidal and 
noise components. It is also possible to derive other components of the input audio signal 
such as harmonic complexes, although these are not relevant for the purposes of the present 
invention. 

In the sinusoidal analyser 130 of Figure 1 the signal x2 for each segment is 
modelled using a number of sinusoids represented by amplitude, frequency and phase 
parameters. This information is usually extracted for an analysis time interval by performing 
a Fourier transform (FT) which provides a spectral representation of the interval including: 
frequencies, amplitudes for each frequency, and phases for each frequency, where each phase 
is "wrapped", i.e. in the range [-π; π]. Once the sinusoidal information for a segment is 
estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link 
sinusoids in different segments with each other on a segment-to-segment basis to obtain so- 
called tracks. The tracking algorithm thus results in sinusoidal codes Cs comprising 
sinusoidal tracks that start at a specific time instance, evolve for a certain duration of time 
over a plurality of time segments and then stop. 

In such sinusoidal encoding, it is usual to transmit frequency information for 
the tracks formed in the encoder. This can be done in a simple manner and at relatively 
low cost, since tracks have only slowly varying frequency. Frequency information can 
therefore be transmitted efficiently by time differential encoding. In general, amplitude can 
also be encoded differentially over time. 
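Time differential encoding can be sketched as follows. This is a minimal illustration only: it assumes the frequencies of a track have already been quantised to integer indices, and the function names are hypothetical rather than taken from the cited applications.

```python
def differential_encode(indices):
    """Return the first value followed by frame-to-frame differences."""
    if not indices:
        return []
    diffs = [indices[0]]
    for prev, cur in zip(indices, indices[1:]):
        diffs.append(cur - prev)
    return diffs


def differential_decode(diffs):
    """Invert differential_encode by accumulating the differences."""
    values = []
    total = 0
    for d in diffs:
        total += d
        values.append(total)
    return values


# Quantised frequency indices of one slowly varying track (made-up values).
track = [440, 441, 441, 442, 441]
coded = differential_encode(track)
```

Because the track varies slowly, the differences stay small and can be represented with few bits.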

In contrast to frequency, phase changes more rapidly with time. If the 
frequency is (substantially) constant, the phase will change (substantially) linearly with time, 
and frequency changes will result in corresponding phase deviations from the linear course. 
As a function of the track segment index, phase will have an approximately linear behaviour. 
Transmission of encoded phase is therefore more complicated. However, when transmitted, 
phase is limited to the range [-π; π], i.e. the phase is "wrapped", as provided by the Fourier 
transform. Because of this modulo 2π representation of phase, the structural inter-frame 
relation of the phase is lost and, at first sight, the phase appears to be a random variable. 

However, since the phase is the integral of the frequency, the phase is 
redundant and, in principle, need not be transmitted. This reduces the bit rate significantly. 
In the decoder the phase is recovered by a process which is called phase continuation. 

In phase continuation, only the encoded frequency is transmitted, and the 
phase is recovered at the decoder from the frequency data by exploiting the integral relation 
between phase and frequency. It is known, however, that when phase continuation is used, 
the phase cannot be perfectly recovered. If frequency errors occur, e.g. due to measurement 
errors in the frequency or due to quantisation noise, the phase, being reconstructed using the 
integral relation, will typically show an error having the character of drift. This is because 
frequency errors have an approximately random character. Low-frequency errors are 
amplified by integration, and consequently the recovered phase will tend to drift away from 
the actually measured phase. This leads to audible artefacts. 
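The drift can be made concrete with a small numerical sketch. The set-up is an assumption chosen for illustration: a constant-frequency track, a uniform 1 Hz frequency grid standing in for the frequency quantiser, and an update interval of 10 ms. With a constant frequency error the phase error grows linearly; with random errors it would wander instead, but it would still accumulate.

```python
import math


def continued_phase(freqs, U, phase0=0.0):
    """Rebuild phase by integrating frequency (trapezoidal rule) from phase0."""
    phases = [phase0]
    for k in range(1, len(freqs)):
        phases.append(phases[-1] + (freqs[k] + freqs[k - 1]) * U / 2.0)
    return phases


def quantise(value, step):
    """Uniform quantiser, used only as a stand-in for frequency coding."""
    return round(value / step) * step


U = 0.01                                                      # update interval in seconds (assumed)
true_freqs = [2 * math.pi * 440.3] * 50                       # rad/s, constant track
coded_freqs = [quantise(w, 2 * math.pi) for w in true_freqs]  # snapped to a 1 Hz grid

true_phase = continued_phase(true_freqs, U)
decoded_phase = continued_phase(coded_freqs, U)
drift = [d - t for d, t in zip(decoded_phase, true_phase)]    # accumulated phase error
```

Here the frequency error of 0.3 Hz is small, yet the phase error after 49 updates is close to one radian.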

This is illustrated in Figure 2a where Ω and ψ are the real frequency and real 
phase, respectively, for a track. In both the encoder and decoder frequency and phase have an 
integral relationship as represented by the letter "I". The quantisation process in the encoder 
is modelled as an added noise n. In the decoder, the recovered phase ψ̂ thus includes two 
components: the real phase ψ and a noise component ε2, where both the spectrum of the 
recovered phase and the power spectral density function of the noise ε2 have a pronounced 
low-frequency character. 

13.10.2003 

Thus, it can be seen that in phase continuation, since the recovered phase is the 
integral of a low-frequency signal, the recovered phase is a low-frequency signal itself. 
However, the noise introduced in the reconstruction process is also dominant in this low- 
frequency range. It is therefore difficult to separate these sources with a view to filtering the 
noise n introduced during encoding. 

Further, in phase continuation, for each track only the phase of the first sinusoid 
is transmitted in order to save bit rate. Each subsequent phase is calculated from the 
initial phase and frequencies of the track. Since the frequencies are quantised and not always 
very accurately estimated, the continuous phase will deviate from the measured phase. 
Experiments show that phase continuation degrades the quality of an audio signal. 

European Patent Application 02080002.5 (PHNL021216) addresses these 
problems by proposing a joint frequency/phase quantiser, where the measured phases of a 
sinusoidal track, which have values between -π and π, are unwrapped using the measured 
frequencies and linking information, resulting in monotonically increasing unwrapped phases 
along a track. In the encoder, the unwrapped phases are quantised using an Adaptive 
Differential Pulse Code Modulation (ADPCM) quantiser and transmitted to the decoder. The 
decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped 
phase trajectory. 

As an example, the ADPCM quantiser can be configured as described below. 
For the first continuation of a track, the unwrapped phase is quantised according to Table 1. 



Representation level r    Representation table R    Level type
0                         -3.0                      Outer level
1                         -0.75                     Inner level
2                          0.75                     Inner level
3                          3.0                      Outer level

Table 1: Representation table R used for first continuation. 

The quantisation boundaries are defined according to this table by: {-∞; 2T(r=1), 
0, 2T(r=2), ∞}. For each consecutive continuation, the tables are scaled. If the 
representation level is in the outer level, the tables are multiplied by 2^(1/2), making the 
quantisation accuracy coarser. Otherwise, the representation levels are in the inner level and 
the tables are scaled by 2^(-1/4), making the quantisation accuracy finer. Furthermore, there is an 
upper and lower bound to the inner level, namely 3π/4 and π/64. 

The quantisation of the unwrapped phase trajectory is a continuous process in 
the above methods, where the quantisation accuracy is adapted along the track. Therefore, in 
order to decode a track, the decoding process has to start from the birth or starting point of a 
track, i.e. the decoder can only de-quantise a complete track and it is not possible to decode a 
part of the track. Therefore, special methods enabling random-access have to be added to the 
encoder and decoder. Random-access may e.g. be used to 'skip' or 'fast forward' in an audio 
signal. 

A first straightforward way to perform random access is to define random-access 
frames (or refresh points) in the encoder/quantiser and re-start the ADPCM quantiser 
in the decoder at these random-access frames. For the random-access frame, the initial tables 
are used. This way requires no extra bits. However, a drawback of this approach is that the 
quantisation tables and thus the quantisation accuracy have to be adapted again from the 
random-access frame onwards. Therefore, initially, the quantisation accuracy might be 
too coarse, resulting in a discontinuity in the track, or too fine, resulting in large quantisation 
errors. This leads to a degradation of the audio quality compared to the decoded signals 
without the use of random-access frames. 

A second straightforward way is to transmit all states of the ADPCM quantiser 
(that is, the quantisation accuracy and the memories in the predictor as mentioned in European 
Patent Application 02080002.5 (PHNL021216)). The quantiser will then have similar output 
with or without random-access frames. In this way, the sound quality will hardly suffer. 
However, the additional bit rate needed to transmit all this information will be considerable, 
especially since the contents of the memories of the predictor have to be quantised according 
to the quantisation accuracy of the ADPCM quantiser. 

The present invention addresses these problems. 


SUMMARY OF THE INVENTION 

The present invention provides a method of encoding a broadband signal, in 
particular an audio signal or a speech signal, using a low bit-rate. More specifically, the 
invention provides a method of encoding an audio signal, the method comprising the steps of: 
providing a respective set of sampled signal values for each of a plurality of sequential time 
segments; analysing the sampled signal values to determine one or more sinusoidal 
components for each of the plurality of sequential segments; linking sinusoidal components 
across a plurality of sequential segments to provide sinusoidal tracks, each track comprising a 
number of frames; and generating an encoded signal including sinusoidal codes comprising a 
representation level for zero or more frames and where some of these codes comprise a 
phase, a frequency and a quantisation table for a given frame when the given frame is 
designated as a random-access frame. 

In this way, random-access is enabled, e.g. allowing for skipping through a 
track, etc., while avoiding the long adaptation of the quantisation accuracy in a quantiser, e.g. 
an ADPCM quantiser, of the prior art, as (some of) the quantisation state is transmitted (in 
the form of the quantisation table) to the decoder. 

Further, the quantisation table is adapted faster compared to the first 
straightforward method that uses the default initial table. Additionally, compared to the 
second straightforward method the present invention results in a lower bit rate. 

The present invention offers a good compromise between the two 
(straightforward) methods, by transmitting only the quantisation accuracy, thereby providing 
good quality at a low bit rate. 

In a preferred embodiment, each quantisation table is represented by an index 
where the index is transmitted from the encoder to the decoder at a random-access frame 
instead of the quantisation table. The index may e.g. be generated or represented using 
Huffman coding. 

Preferably, the phase (φ) and the frequency (ω) for a random-access frame are 
the initial phase (φ(0)) and the initial frequency (ω(0)). 


BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a prior art audio encoder in which an embodiment of the 
invention is implemented; 

Figure 2a illustrates the relationship between phase and frequency in prior art 
systems; 

Figure 2b illustrates the relationship between phase and frequency in audio 
systems using phase encoding; 

Figures 3a and 3b show a preferred embodiment of a sinusoidal encoder 
component of the audio encoder of Figure 1 according to the present invention; 

Figure 4 shows an audio player in which an embodiment of the invention is 
implemented; 

Figures 5a and 5b show a preferred embodiment of a sinusoidal synthesizer 
component of the audio player of Figure 4 according to the present invention; 

Figure 6 shows a system comprising an audio encoder and an audio player 
according to the invention; and 

Figures 7a and 7b illustrate the information sent from the encoder and received 
at the decoder according to prior art and to the present invention, respectively. 


DESCRIPTION OF THE PREFERRED EMBODIMENT 

Preferred embodiments of the invention will now be described with reference 
to the accompanying drawings wherein like components have been accorded like reference 
numerals and, unless otherwise stated, perform like functions. 

Figure 1 shows a prior art audio encoder 1 in which an embodiment of the 
invention is implemented. In a preferred embodiment of the present invention, the encoder 1 
is a sinusoidal encoder of the type described in WO 01/69593, Figure 1 and European Patent 
Application 02080002.5 (PHNL021216), Figure 1. The operation of this prior art encoder 
and its corresponding decoder has been well described and description is only provided here 
where relevant to the present invention. 

In both the prior art and the preferred embodiment of the present invention, the 
audio encoder 1 samples an input audio signal at a certain sampling frequency resulting in a 
digital representation x(t) of the audio signal. The encoder 1 then separates the sampled input 
signal into three components: transient signal components, sustained deterministic 
components, and sustained stochastic components. The audio encoder 1 comprises a transient 
encoder 11, a sinusoidal encoder 13 and a noise encoder (NA) 14. 

The transient encoder 11 comprises a transient detector (TD) 110, a transient 
analyser (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the 
transient detector 110. This detector 110 estimates whether there is a transient signal component 
and its position. This information is fed to the transient analyser (TA) 111. If the position of a 
transient signal component is determined, the transient analyser (TA) 111 tries to extract (the 
main part of) the transient signal component. It matches a shape function to a signal segment 
preferably starting at an estimated start position, and determines content underneath the shape 
function, by employing for example a (small) number of sinusoidal components. This 
information is contained in the transient code CT, and more detailed information on 
generating the transient code CT is provided in WO 01/69593. 

The transient code CT is furnished to the transient synthesizer (TS) 112. The 
synthesized transient signal component is subtracted from the input signal x(t) in subtracter 
16, resulting in a signal x1. A gain control mechanism GC (12) is used to produce x2 from 
x1. 

The signal x2 is furnished to the sinusoidal encoder 13 where it is analysed in 
a sinusoidal analyser (SA) 130, which determines the (deterministic) sinusoidal components. 
It will therefore be seen that while the presence of the transient analyser is desirable, it is not 
necessary and the invention can be implemented without such an analyser. Alternatively, as 
mentioned above, the invention can also be implemented with for example a harmonic 
complex analyser. In brief, the sinusoidal encoder encodes the input signal x2 as tracks of 
sinusoidal components linked from one frame segment to the next. 

Referring now to Figure 3a, in the same manner as in the prior art, in the 
preferred embodiment, each segment of the input signal x2 is transformed into the frequency 
domain in a Fourier transform (FT) unit 40. For each segment, the FT unit provides measured 
amplitudes A, phases φ and frequencies ω. As mentioned previously, the range of phases 
provided by the Fourier transform is restricted to -π ≤ φ < π. A tracking algorithm (TRA) unit 
42 takes the information for each segment and, by employing a suitable cost function, links 
sinusoids from one segment to the next, so producing a sequence of measured phases φ(k) 
and frequencies ω(k) for each track. 

The sinusoidal codes Cs ultimately produced by the analyser 130 include 
phase information, and frequency is reconstructed from this information in the decoder, as is 
mentioned in European Patent Application 02080002.5 (PHNL021216). According to the 
present invention a quantisation table (Q) or preferably an index (IND) representing the 
quantisation table (Q) is produced by the analyser 130 instead of a representation level r 
when the given sub-frame being processed is a random-access frame, as will be explained in 
greater detail in connection with Fig. 3b. 

As mentioned above, however, the measured phase φ(k) is wrapped, which 
means that it is restricted to a modulo 2π representation. Therefore, in the preferred 
embodiment, the analyser comprises a phase unwrapper (PU) 44 where the modulo 2π phase 
representation is unwrapped to expose the structural inter-frame phase behaviour ψ for a 
track. As the frequency in sinusoidal tracks is nearly constant, it will be seen that the 
unwrapped phase ψ will typically be a nearly linearly increasing (or decreasing) function and 
this makes cheap transmission of phase, i.e. with low bit rate, possible. The unwrapped phase 
ψ is provided as input to a phase encoder (PE) 46, which provides as output quantised 
representation levels r suitable for being transmitted (when a given sub-frame is not a 
random-access frame). 

Referring now to the operation of the phase unwrapper 44, as mentioned 
above, instantaneous phase ψ and instantaneous frequency Ω for a track are related by: 

$$\psi(t) = \int_{T_0}^{t} \Omega(\tau)\,d\tau + \psi(T_0) \qquad (1)$$

where T0 is a reference time instant. 

A sinusoidal track in frames k = K, K+1, ..., K+L-1 has measured frequencies 
ω(k) (expressed in radians per second) and measured phases φ(k) (expressed in radians). The 
distance between the centres of the frames is given by U (update rate expressed in seconds). 
The measured frequencies are supposed to be samples of the assumed underlying continuous-time 
frequency track Ω with ω(k) = Ω(kU) and, similarly, the measured phases are samples 
of the associated continuous-time phase track ψ with φ(k) = ψ(kU) mod (2π). For sinusoidal 
encoding it is assumed that Ω is a nearly constant function. 

Assuming that the frequencies are nearly constant within a segment, Equation 1 
can be approximated as follows: 

$$\psi(kU) = \int_{(k-1)U}^{kU} \Omega(t)\,dt + \psi((k-1)U) \approx \{\omega(k) + \omega(k-1)\}\,U/2 + \psi((k-1)U) \qquad (2)$$

It will therefore be seen that knowing the phase and frequency for a given 
segment and the frequency of the next segment, it is possible to estimate an unwrapped phase 
value for the next segment, and so on for each segment in a track. 
In the preferred embodiment, the phase unwrapper determines an unwrap 
factor m(k) at time instant k: 

$$\psi(kU) = \phi(k) + m(k)\,2\pi \qquad (3)$$

The unwrap factor m(k) tells the phase unwrapper 44 the number of cycles 
which has to be added to obtain the unwrapped phase. 
Combining equations 2 and 3, the phase unwrapper determines an incremental 
unwrap factor e(k) as follows: 

$$2\pi e(k) = 2\pi\{m(k) - m(k-1)\} = \{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}$$

where e should be an integer. However, due to measurement and model errors, the 
incremental unwrap factor will not be an integer exactly, so: 

$$e(k) = \mathrm{round}\big([\{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}]/(2\pi)\big)$$

assuming that the model and measurement errors are small. 

Having the incremental unwrap factor e, the m(k) from equation (3) is 
calculated as the cumulative sum where, without loss of generality, the phase unwrapper 
starts in the first frame K with m(K) = 0, and from m(k) and φ(k), the (unwrapped) phase 
ψ(kU) is determined. 
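The unwrapping procedure of equations (2) and (3) can be sketched as below. This is an illustrative reading of the text, not the reference implementation: the function name `unwrap_track`, the helper `wrap` and the example values are assumptions.

```python
import math


def unwrap_track(phis, omegas, U):
    """Unwrap measured phases phis (rad) using measured frequencies omegas (rad/s).

    U is the update interval in seconds. Returns the unwrapped phases psi(kU)
    and the unwrap factors m(k), starting with m = 0 in the first frame.
    """
    two_pi = 2.0 * math.pi
    m = [0]
    psi = [phis[0]]
    for k in range(1, len(phis)):
        predicted_step = (omegas[k] + omegas[k - 1]) * U / 2.0   # from equation (2)
        measured_step = phis[k] - phis[k - 1]
        e = round((predicted_step - measured_step) / two_pi)     # incremental unwrap factor
        m.append(m[-1] + e)                                      # cumulative sum gives m(k)
        psi.append(phis[k] + m[-1] * two_pi)                     # equation (3)
    return psi, m


def wrap(x):
    """Map a phase to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


# Example: a 100 Hz track sampled every 8 ms (made-up values).
omega = 2.0 * math.pi * 100.0
U = 0.008
true_psi = [0.3 + omega * U * k for k in range(6)]
measured = [wrap(p) for p in true_psi]
psi, m = unwrap_track(measured, [omega] * 6, U)
```

With error-free measurements the recovered `psi` equals the underlying linear phase; with noisy measurements the rounding step absorbs errors as long as they stay well below half a cycle.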
In practice, the sampled data ψ(kU) and Ω(kU) are distorted by 
measurement errors: 

$$\phi(k) = \psi(kU) + \varepsilon_1(k)$$
$$\omega(k) = \Omega(kU) + \varepsilon_2(k)$$

where ε1 and ε2 are the phase and frequency errors, respectively. In order to prevent the 
determination of the unwrap factor becoming ambiguous, the measurement data needs to be 
determined with sufficient accuracy. Thus, in the preferred embodiment, tracking is restricted 
so that: 

$$\delta(k) = e(k) - [\{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}]/(2\pi) < \delta_0$$

where δ is the error in the rounding operation. The error δ is mainly determined by the errors 
in ω due to the multiplication with U. Assume that ω is determined from the maxima of the 
absolute value of the Fourier transform from a sampled version of the input signal with 
sampling frequency Fs and that the resolution of the Fourier transform is 2π/La with La the 
analysis size. In order to be within the considered bound, we have: 

$$\frac{L_a}{U} = \frac{1}{\delta_0}$$

That means that the analysis size should be a few times larger than the update 
size in order for unwrapping to be accurate, e.g., setting δ0 = 1/4, the analysis size should be 
four times the update size (neglecting the errors ε1 in the phase measurement). 

The second precaution, which can be taken to avoid decision errors in the 
round operation, is to define tracks appropriately. In the tracking unit 42, sinusoidal tracks 
are typically defined by considering amplitude and frequency differences. Additionally, it is 
also possible to account for phase information in the linking criterion. For instance, we can 
define the phase prediction error ε as the difference between the measured value and the 
predicted value φ̃ according to 

$$\varepsilon = \{\phi(k) - \tilde{\phi}(k)\} \bmod 2\pi$$

where the predicted value can be taken as 

$$\tilde{\phi}(k) = \phi(k-1) + \{\omega(k) + \omega(k-1)\}\,U/2$$

Thus, preferably the tracking unit (TRA) 42 forbids tracks where ε is larger 
than a certain value (e.g. ε > π/2), resulting in an unambiguous definition of e(k). 
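A possible form of this linking check is sketched below. Two points are assumptions made for the example: the modulo result is mapped to the symmetric interval (-π, π] so that its magnitude can be compared with the threshold, and the function names are hypothetical.

```python
import math


def phase_prediction_error(phi_prev, phi_cur, omega_prev, omega_cur, U):
    """Difference between measured and predicted phase, mapped to (-pi, pi]."""
    predicted = phi_prev + (omega_cur + omega_prev) * U / 2.0
    err = (phi_cur - predicted) % (2.0 * math.pi)
    if err > math.pi:                      # assumed symmetric mapping
        err -= 2.0 * math.pi
    return err


def link_allowed(phi_prev, phi_cur, omega_prev, omega_cur, U,
                 max_error=math.pi / 2.0):
    """Return True when the candidate link keeps the prediction error small."""
    err = phase_prediction_error(phi_prev, phi_cur, omega_prev, omega_cur, U)
    return abs(err) <= max_error


# Example: 100 Hz component, 10 ms update, so exactly one cycle per update.
w = 2.0 * math.pi * 100.0
good = link_allowed(0.0, 0.1, w, w, 0.01)   # measured phase close to prediction
bad = link_allowed(0.0, 3.0, w, w, 0.01)    # measured phase far from prediction
```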

Additionally, the encoder may calculate the phases and frequencies such as 
will be available in the decoder. If the phases or frequencies which will become available in 
the decoder differ too much from the phases and/or frequencies such as are present in the 
encoder, it may be decided to interrupt a track, i.e. to signal the end of a track and start a new 
one using the current frequency and phase and their linked sinusoidal data. 

The sampled unwrapped phase ψ(kU) produced by the phase unwrapper (PU) 
44 is provided as input to phase encoder (PE) 46 to produce the set of representation levels r 
(or according to the present invention, a quantisation table (Q) or an index (IND) 
representing the quantisation table (Q) when the given sub-frame being processed/transmitted 
is a random-access frame). Techniques for efficient transmission of a generally 
monotonically changing characteristic such as the unwrapped phase are known. 

Figure 3b illustrates a preferred embodiment of the phase encoder (PE) 46. In 
this preferred embodiment, Adaptive Differential Pulse Code Modulation (ADPCM) is 
employed. Here, a predictor (PF) 48 is used to estimate the phase of the next track segment 
and encode the difference only in a quantiser (QT) 50. Since ψ is expected to be a nearly 
linear function and for reasons of simplicity, the predictor 48 is chosen as a second-order 
filter of the form: 

y(k + 1) = 2x(k) - x(k - 1) 

where x is the input and y is the output. It will be seen, however, that it is also possible to 
take other functional relations (including higher-order relations) and to include (backward or 
forward) adaptation of the filter coefficients. In the preferred embodiment, a backward 
adaptive control mechanism (QC) 52 is used for simplicity to control the quantiser (QT) 50. 
Forward adaptive control is possible as well but would require extra bit rate. 
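The predictor amounts to linear extrapolation from the two most recent samples, which is why it suits a nearly linear unwrapped phase. The short sketch below illustrates this; the function name and the sample values are assumptions chosen for the example.

```python
def predict_next(x_cur, x_prev):
    """Second-order predictor y(k + 1) = 2 x(k) - x(k - 1)."""
    return 2.0 * x_cur - x_prev


# A perfectly linear unwrapped phase (constant frequency) is predicted exactly.
U = 0.01          # update interval in seconds (assumed)
omega = 300.0     # rad/s (assumed)
psi = [0.5 + omega * U * k for k in range(5)]
errors = [psi[k + 1] - predict_next(psi[k], psi[k - 1]) for k in range(1, 4)]
```

Only deviations from the linear course, caused by frequency changes, produce a non-zero prediction error for the quantiser to encode.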

As will be seen, initialisation of the encoder (and decoder) for a track starts 
with knowledge of the start phase φ(0) and frequency ω(0). These are quantised and 
transmitted by a separate mechanism. Additionally, the initial quantisation step used in the 
quantisation controller (QC) 52 of the encoder and the corresponding controller 62 in the 
decoder, Figure 5b, is either transmitted or set to a certain value in both encoder and decoder. 
Finally, the end of a track can either be signalled in a separate side stream or as a unique 
symbol in the bit stream of the phases. 

The start frequency of the unwrapped phase is known, both in the encoder and 
in the decoder. On the basis of this frequency, the quantisation accuracy is chosen. For 
unwrapped phase trajectories beginning with a low frequency, a more accurate quantisation 
grid, i.e. a higher resolution, is chosen than for an unwrapped phase trajectory beginning with 
a higher frequency. 

In the ADPCM quantiser, the unwrapped phase ψ(k), where k represents the 
number in the track, is predicted/estimated from the preceding phases in the track. The 
difference between the predicted phase ψ̃(k) and the unwrapped phase ψ(k) is then 
quantised and transmitted. The quantiser is adapted for every unwrapped phase in the track. 
When the prediction error is small, the quantiser limits the range of possible values and the 
quantisation can become more accurate. On the other hand, when the prediction error is large, 
the quantiser uses a coarser quantisation. 

The quantiser Q in Figure 3b quantises the prediction error Δ, which is 
calculated by 

$$\Delta(k) = \psi(k) - \tilde{\psi}(k)$$

The prediction error Δ can be quantised using a look-up table. For this purpose, a table Q is 
maintained. For example, for a 2-bit ADPCM quantiser, the initial table for Q may look like 
the table shown in Table 2. 



Index i    Lower boundary bl    Upper boundary bu
0          -∞                   -3.0
1          -3.0                  0
2           0                    3.0
3           3.0                  ∞

Table 2: Quantisation table Q used for first continuation. 

The quantisation is done as follows. The prediction error Δ is compared to the 
boundaries b, such that the following equation is satisfied: 

$$bl_i < \Delta < bu_i$$

From the value of i that satisfies the above relation, the representation level r is computed by 
r = i. 

The associated representation levels are stored in representation table R, which is shown in 
Table 3. 

Representation level r    Representation table R    Level type
0                         -3.0                      Outer level
1                         -0.75                     Inner level
2                          0.75                     Inner level
3                          3.0                      Outer level

Table 3: Representation table R used for first continuation 
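The table look-up can be sketched as follows, using the boundary values of Table 2 and the representation levels of Table 3. The function names are hypothetical, and assigning a value that falls exactly on a boundary to the upper interval is an assumption, since the text does not say how ties are resolved.

```python
import math

Q = [-math.inf, -3.0, 0.0, 3.0, math.inf]   # interval boundaries from Table 2
R = [-3.0, -0.75, 0.75, 3.0]                # representation levels from Table 3


def quantise(delta, boundaries):
    """Return the representation level r = i whose interval contains delta."""
    for i in range(len(boundaries) - 1):
        # Assumed tie rule: a boundary value belongs to the upper interval.
        if boundaries[i] <= delta < boundaries[i + 1]:
            return i
    raise ValueError("delta must be a finite number")


def dequantise(r, levels):
    """Convert a representation level back to a quantised prediction error."""
    return levels[r]


r = quantise(0.5, Q)          # prediction error of 0.5 rad
restored = dequantise(r, R)   # value the decoder will use instead
```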

The entries of tables Q and R are multiplied by a factor c for the quantisation 
of the next sinusoidal component in the track. 

$$Q(k+1) = Q(k) \cdot c$$
$$R(k+1) = R(k) \cdot c$$

During the decoding of a track, both tables are scaled according to the generated 
representation levels r. If r is either 1 or 2 (inner level) for the current sub-frame, then the 
scale factor c for the quantisation table is set to 

$$c = 2^{-1/4}$$

Since c < 1, the frequency and phase of the next sinusoid in a track become more accurate. If 
r is 0 or 3 (outer level), the scale factor is set to 

$$c = 2^{1/2}$$

Since c > 1, the quantisation accuracy for the next sinusoid in a track decreases. Using these 
factors, one up-scaling can be undone by two down-scalings. The difference in upscale 
and downscale factors results in a fast onset of an up-scaling, whereas a corresponding 
downscaling requires two steps. 

In order to avoid very small or very large entries in the quantisation table, the 
adaptation is only done if the absolute value of the inner level is between π/64 and 3π/4. 
Otherwise, c is set to 1. 
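The adaptation rule can be sketched as below. One detail is an assumption: the bound is checked on the inner level as it would be after scaling, so that the tables never leave the permitted range. The text leaves open whether the check applies before or after scaling, and the function names are hypothetical.

```python
import math

UP = 2.0 ** 0.5          # outer level: coarser quantisation
DOWN = 2.0 ** -0.25      # inner level: finer quantisation
INNER_MIN = math.pi / 64.0
INNER_MAX = 3.0 * math.pi / 4.0


def scale_factor(r, R):
    """Choose c from the representation level r of a 2-bit quantiser."""
    c = UP if r in (0, 3) else DOWN
    scaled_inner = abs(R[2]) * c
    # Assumed reading of the bound: skip the adaptation if it would leave the range.
    if not (INNER_MIN <= scaled_inner <= INNER_MAX):
        return 1.0
    return c


def adapt(Q, R, r):
    """Return the tables to be used for the next sinusoid in the track."""
    c = scale_factor(r, R)
    return [q * c for q in Q], [x * c for x in R]


Q0 = [-math.inf, -3.0, 0.0, 3.0, math.inf]
R0 = [-3.0, -0.75, 0.75, 3.0]
Q1, R1 = adapt(Q0, R0, 3)     # one up-scaling
Q2, R2 = adapt(Q1, R1, 1)     # first down-scaling
Q3, R3 = adapt(Q2, R2, 2)     # second down-scaling restores the original scale
```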

In the decoder only table R has to be maintained to convert the received 
representation levels r to a quantised prediction error. This de-quantisation operation is 
performed by block (DQ) 60 in Figure 5b. 
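A decoder-side loop that combines de-quantisation, the second-order predictor and the table scaling might look as follows. This is a simplified sketch under stated assumptions: the predictor memory is initialised from the start phase and start frequency by stepping back one update interval, the bounds on the inner level are omitted, and all names are hypothetical.

```python
UP = 2.0 ** 0.5       # scale factor after an outer level
DOWN = 2.0 ** -0.25   # scale factor after an inner level


def decode_track(levels, psi0, omega0, U, R_init):
    """Rebuild unwrapped phases from representation levels r (2-bit quantiser)."""
    R = list(R_init)
    prev = psi0 - omega0 * U      # assumed virtual sample before the track start
    cur = psi0
    phases = [psi0]
    for r in levels:
        predicted = 2.0 * cur - prev          # predictor y(k+1) = 2x(k) - x(k-1)
        nxt = predicted + R[r]                # add the de-quantised prediction error
        phases.append(nxt)
        prev, cur = cur, nxt
        c = UP if r in (0, 3) else DOWN       # backward adaptation of table R
        R = [x * c for x in R]
    return phases


decoded = decode_track([2, 1], psi0=0.0, omega0=100.0, U=0.01,
                       R_init=[-3.0, -0.75, 0.75, 3.0])
```

Because the adaptation depends only on the received levels, the decoder tracks the encoder's table without side information, which is also why decoding cannot start in the middle of a track without extra data.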

Using the above settings, the quality of the reconstructed sound needs 
improvement. In accordance with the invention, different initial tables for unwrapped phase 
tracks, depending on the start frequency, are used. Hereby a better sound quality is obtained. 
This is done as follows. The initial tables Q and R are scaled on the basis of the first frequency 
of the track. In Table 4, the scale factors are given together with the frequency ranges. If the first 
frequency of a track lies in a certain frequency range, the appropriate scale factor is selected, 
and the tables R and Q are divided by that scale factor. The end-points can also depend on the 
first frequency of the track. In the decoder, a corresponding procedure is performed in order 
to start with the correct initial table R. 



Frequency range    Scale factor    Initial table Q          Initial table R
0 - 500 Hz         8               -∞ -0.19 0 0.19 ∞        -0.38 -0.09 0.09 0.38
500 - 1000 Hz      4               -∞ -0.37 0 0.37 ∞        -0.75 -0.19 0.19 0.75
1000 - 4000 Hz     2               -∞ -0.75 0 0.75 ∞        -1.5 -0.38 0.38 1.5
4000 - 22050 Hz    1               -∞ -1.5 0 1.5 ∞          -3 -0.75 0.75 3

Table 4: Frequency dependent scale factors and initial tables 



Table 4 shows an example of frequency dependent scale factors and 
corresponding initial tables Q and R for a 2-bit ADPCM quantiser. The audio frequency 
range 0-22050 Hz is divided into four frequency sub-ranges. It is seen that the phase accuracy 
is improved in the lower frequency ranges relative to the higher frequency ranges. 

The number of frequency sub-ranges and the frequency dependent scale 
factors may vary and can be chosen to fit the individual purpose and requirements. As 
described above, the frequency dependent initial tables Q and R in Table 4 may be up-scaled 
and down-scaled dynamically to adapt to the evolution in phase from one time segment to the 
next. 
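The selection of initial tables according to Table 4 can be sketched as below. The base tables are the unscaled row of Table 4 (scale factor 1). Treating each range as including its lower edge and excluding its upper edge is an assumption, as the table does not state to which range a boundary frequency such as 500 Hz belongs; the names are hypothetical.

```python
import math

BASE_Q = [-math.inf, -1.5, 0.0, 1.5, math.inf]   # Table 4, scale factor 1
BASE_R = [-3.0, -0.75, 0.75, 3.0]

# (lower edge in Hz, upper edge in Hz, scale factor), taken from Table 4.
RANGES = [
    (0.0, 500.0, 8.0),
    (500.0, 1000.0, 4.0),
    (1000.0, 4000.0, 2.0),
    (4000.0, 22050.0, 1.0),
]


def scale_for(first_freq_hz):
    """Return the scale factor for the first frequency of a track."""
    for low, high, scale in RANGES:
        if low <= first_freq_hz < high:      # assumed half-open ranges
            return scale
    if first_freq_hz == 22050.0:
        return 1.0
    raise ValueError("frequency outside the range 0-22050 Hz")


def initial_tables(first_freq_hz):
    """Divide the base tables by the frequency dependent scale factor."""
    s = scale_for(first_freq_hz)
    return [q / s for q in BASE_Q], [r / s for r in BASE_R]


q_low, r_low = initial_tables(440.0)     # finer grid for a low start frequency
q_high, r_high = initial_tables(5000.0)  # unscaled tables for a high start frequency
```

The values printed in Table 4 are these results rounded to two decimals.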

In e.g. a 3-bit ADPCM quantiser, the initial boundaries of the eight 
quantisation intervals defined by the 3 bits can be defined as follows: 

Q = {-∞ -1.41 -0.707 -0.35 0 0.35 0.707 1.41 ∞}, and can have a minimum grid size π/64 and 
a maximum grid size π/2. The representation table R may look like: 

R = {-2.117, -1.0585, -0.5285, -0.1750, 0.1750, 0.5285, 1.0585, 2.117}. A similar frequency 
dependent initialisation of the tables Q and R as shown in Table 4 may be used in this case. 

So far the process has been as described in European Patent Application 
02080002.5 (PHNL021216). 

According to the present invention, quantiser (QT) 50, predictor (PF) 48 and 
backward adaptive control mechanism (QC) 52 may further receive an (external) trigger signal 
(Trig.) indicating that the given frame being processed is a random-access frame. When no 
trigger signal (Trig.) is received the process functions normally and only representation levels 
r are transmitted to the decoder. When a trigger (Trig.) is received (signifying a random-access 
frame) no representation levels r are transmitted but instead the quantisation table (Q) 
or an index (IND) representing the quantisation table (Q) is transmitted, together with the 
initial phase (φ(0)) and the initial frequency (ω(0)). 
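The two cases can be summarised in a small sketch of what is emitted per frame. The dictionary layout and the function name are purely illustrative assumptions; the actual bit-stream syntax is not given in this text.

```python
def sinusoidal_code(trigger, level=None, table_index=None,
                    phase=None, frequency=None):
    """Return the data transmitted for one frame of a track (illustrative layout)."""
    if trigger:
        # Random-access frame: table index plus phase and frequency, no level r.
        return {"kind": "random_access", "index": table_index,
                "phase": phase, "frequency": frequency}
    # Ordinary continuation: only the representation level r.
    return {"kind": "continuation", "level": level}


normal = sinusoidal_code(False, level=2)
refresh = sinusoidal_code(True, table_index=6, phase=0.3, frequency=2764.6)
```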

By properly setting the parameters of the quantiser, only a limited number of 
quantisation tables are possible. For the example given in Table 1, there are only 22 possible 
quantisation tables, which are listed below in Table 5 together with an index number. 



Index    T1         T2         T3        T4
0        -4.2426    -1.0607    1.0607    4.2426
1        -3.5676    -0.8919    0.8919    3.5676
2        -3.0000    -0.7500    0.7500    3.0000
3        -2.5227    -0.6307    0.6307    2.5227
4        -2.1213    -0.5303    0.5303    2.1213
5        -1.7838    -0.4460    0.4460    1.7838
6        -1.5000    -0.3750    0.3750    1.5000
7        -1.2613    -0.3153    0.3153    1.2613
8        -1.0607    -0.2652    0.2652    1.0607
9        -0.8919    -0.2230    0.2230    0.8919
10       -0.7500    -0.1875    0.1875    0.7500
11       -0.6307    -0.1577    0.1577    0.6307
12       -0.5303    -0.1326    0.1326    0.5303
13       -0.4460    -0.1115    0.1115    0.4460
14       -0.3750    -0.0938    0.0938    0.3750
15       -0.3153    -0.0788    0.0788    0.3153
16       -0.2652    -0.0663    0.0663    0.2652
17       -0.2230    -0.0557    0.0557    0.2230
18       -0.1875    -0.0469    0.0469    0.1875
19       -0.1577    -0.0394    0.0394    0.1577
20       -0.1326    -0.0331    0.0331    0.1326
21       -0.1115    -0.0279    0.0279    0.1115



Table 5: Quantisation tables at random-access frames 
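
The values listed in Table 5 appear to be consistent with a geometric progression: the outer boundary T4 starts at 3·2^(1/2) for index 0 and shrinks by a factor 2^(-1/4) per index step, while the inner boundary T3 equals T4/4. The sketch below regenerates the table under that assumption. The closed form is inferred from the listed numbers only and is not stated in the text, and the function name `quantisation_table` is hypothetical.

```python
def quantisation_table(index):
    """Boundaries (T1, T2, T3, T4) for a Table 5 index 0..21 (inferred pattern)."""
    if not 0 <= index <= 21:
        raise ValueError("Table 5 lists indices 0 to 21 only")
    outer = 3.0 * 2.0 ** ((2 - index) / 4.0)   # T4; shrinks by 2**(-1/4) per step
    inner = outer / 4.0                        # T3; ratio inferred from the table
    return (-outer, -inner, inner, outer)


# All 22 tables, indexed as in Table 5.
TABLE_5 = [quantisation_table(i) for i in range(22)]
```

Storing the tables in this way at both encoder and decoder means that an index alone is sufficient to identify a table.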

So, in a preferred embodiment, in order to reduce the amount of data
transmitted, only an index representing/identifying/indicating the given quantisation table (Q)
is transmitted to the decoder, where the index is used to retrieve the appropriate quantisation
table used as the initial table, which is explained in greater detail in connection with Figure
5b.

Preferably, an index is generated by using the well-known Huffman coding.
For Table 5 such a Huffman coding based index may be as listed in Table 6 below:

Index   Huffman index (IND)
  0     100001
  1     11101
  2     11110
  3     [illegible]
  4     [illegible]
  5     [illegible]
  6     0111
  7     001
  8     [illegible]
  9     [illegible]
 10     1001
 11     [illegible]
 12     [illegible]
 13     0001
 14     11100
 15     01001
 16     111111
 17     111110
 18     100000
 19     010001
 20     010000
 21     10001

Table 6: Huffman Index (IND) for quantisation tables

In a preferred embodiment, instead of sending a given quantisation table or
quantisation state (e.g. 19: T1 = -0.1577; T2 = -0.0394; T3 = 0.0394; T4 = 0.1577), only the
index (IND) (e.g. 010001) is transmitted, thereby saving bit-rate. This index is then used at
the decoder to retrieve the proper quantisation table (e.g. 19), which is then used according to
the present invention.
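
Reading such an index from a bit stream can be sketched as follows, using only those code words of Table 6 that are clearly legible in this text. The codebook is therefore deliberately partial, and the name `decode_index` and the use of a character string of bits are assumptions made for illustration.

```python
# Partial codebook: only entries of Table 6 that are clearly legible here.
PARTIAL_TABLE_6 = {
    "100001": 0, "0001": 13, "11100": 14, "01001": 15, "111111": 16,
    "111110": 17, "100000": 18, "010001": 19, "010000": 20, "10001": 21,
}


def decode_index(bits, codebook=PARTIAL_TABLE_6):
    """Read one prefix code word from the front of `bits`.

    Returns (table index, remaining bits). Because no code word is a prefix
    of another, the first match while extending the prefix is the code word.
    """
    prefix = ""
    for bit in bits:
        prefix += bit
        if prefix in codebook:
            return codebook[prefix], bits[len(prefix):]
    raise ValueError("no code word matched (the codebook here is only partial)")
```

With the example from the text, the bits 010001 decode to index 19, which then selects the table with T4 = 0.1577 in Table 5.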
In this way, random access is enabled while avoiding long adaptation of the quantisation
accuracy in the quantiser, since no re-starting of the quantiser is needed as the current
accuracy of the quantisation table is stored and transmitted to the decoder (either directly, by
transmitting the given quantisation table (Q), or indirectly, by transmitting an index (IND)
referencing/uniquely identifying/indicating the given quantisation table (Q)). Further, the
quantisation table is adapted faster and/or a lower bit rate is obtained.

Random-access frames may e.g. be selected or identified by selecting every
N'th frame during a track, using audio analysis to select appropriate points, etc. Whenever a
random-access frame is being processed, the trigger signal is provided to the quantiser (QT) 50
(and to (PF) 48 and (QC) 52).
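
The simplest of these selection rules, every N'th frame of a track, might be sketched as below. Excluding the birth frame is an assumption made here, on the grounds that the birth frame already carries the initial phase and frequency; the function name is hypothetical.

```python
def is_random_access(frame_in_track, n):
    """True when the frame is an N'th frame of its track (birth frame excluded)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # frame_in_track counts from 0 at the birth of the track.
    return frame_in_track > 0 and frame_in_track % n == 0
```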

From the sinusoidal code Cs generated with the sinusoidal encoder, the
sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131 in the same
manner as will be described for the sinusoidal synthesizer (SS) 32 of the decoder. This signal
is subtracted in subtracter 17 from the input x2 to the sinusoidal encoder 13, resulting in a
remaining signal x3. The residual signal x3 produced by the sinusoidal encoder 13 is passed
to the noise analyser 14 of the preferred embodiment which produces a noise code Cn
representative of this noise, as described in, for example, international patent application No.
PCT/EP00/04599.

Finally, in a multiplexer 15, an audio stream AS is constituted which includes
the codes Ct, Cs and Cn. The audio stream AS is furnished to e.g. a data bus, an antenna
system, a storage medium etc.

Figure 4 shows an audio player 3 suitable for decoding an audio stream AS',
e.g. generated by an encoder 1 of Figure 1, obtained from a data bus, antenna system, storage
medium etc. The audio stream AS' is de-multiplexed in a de-multiplexer 30 to obtain the
codes Ct, Cs and Cn. These codes are furnished to a transient synthesizer (TS) 31, a
sinusoidal synthesizer (SS) 32 and a noise synthesizer (NS) 33 respectively. From the
transient code Ct, the transient signal components are calculated in the transient synthesizer
(TS) 31. In case the transient code indicates a shape function, the shape is calculated based on
the received parameters. Further, the shape content is calculated based on the frequencies and
amplitudes of the sinusoidal components. If the transient code Ct indicates a step, then no
transient is calculated. The total transient signal yt is a sum of all transients.

The sinusoidal code Cs including the information encoded by the analyser 130
is used by the sinusoidal synthesizer 32 to generate signal ys. Referring now to Figures 5a
and 5b, the sinusoidal synthesizer 32 comprises a phase decoder (PD) 56 compatible with the
phase encoder 46. Here, a de-quantiser (DQ) 60 in conjunction with a second-order
prediction filter (PF) 64 produces (an estimate of) the unwrapped phase ψ from: the
representation levels r; initial information φ(0), ω(0) provided to the prediction filter (PF)
64; and the initial quantisation step for the quantisation controller (QC) 62. If the frame is a
random-access frame then the quantisation table (Q), received from the encoder instead of the
representation levels r, is used in the de-quantiser (DQ) 60 as the initial table, as is explained
in greater detail later.

As illustrated in Figure 2b, the frequency can be recovered from the
unwrapped phase ψ by differentiation. Assuming that the phase error at the decoder is
approximately white, and since differentiation amplifies the high frequencies, the
differentiation can be combined with a low-pass filter to reduce the noise and, thus, to obtain
an accurate estimate of the frequency at the decoder.

In the preferred embodiment, a filtering unit (FR) 58 approximates the
differentiation, which is necessary to obtain the frequency ω from the unwrapped phase, by
procedures such as forward, backward or central differences. This enables the decoder to produce
as output the phases ψ and frequencies ω usable in a conventional manner to synthesize the
sinusoidal component of the encoded signal.
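
A minimal sketch of such a difference-based frequency estimate is given below. It assumes that the unwrapped phase is available as one sample per sub-frame and that consecutive samples are spaced by the update interval; the function name and the choice of which difference to use at the ends of the track are illustrative assumptions, not the method of the filtering unit (FR) 58 itself.

```python
def frequency_from_phase(psi, update_interval):
    """Estimate the frequency per sub-frame from unwrapped phase samples."""
    n = len(psi)
    if n < 2:
        raise ValueError("need at least two phase samples")
    freq = []
    for k in range(n):
        if k == 0:
            d = psi[1] - psi[0]                    # forward difference at the start
        elif k == n - 1:
            d = psi[k] - psi[k - 1]                # backward difference at the end
        else:
            d = (psi[k + 1] - psi[k - 1]) / 2.0    # central difference elsewhere
        freq.append(d / update_interval)
    return freq
```

The central difference averages two neighbouring phase increments and therefore attenuates high-frequency phase noise slightly compared with a one-sided difference.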

At the same time as the sinusoidal components of the signal are being
synthesized, the noise code Cn is fed to a noise synthesizer NS 33, which is mainly a filter,
having a frequency response approximating the spectrum of the noise. The NS 33 generates
reconstructed noise yn by filtering a white noise signal with the noise code Cn. The total
signal y(t) comprises the sum of the transient signal yt and the product of any amplitude
decompression (g) and the sum of the sinusoidal signal ys and the noise signal yn. The audio
player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished
to an output unit 35, which is e.g. a speaker.
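
The combination of the three synthesized components, y(t) = yt + g·(ys + yn), can be sketched as follows. Treating the amplitude decompression g as a single scalar gain is an assumption of this sketch; in practice it may vary over time.

```python
def total_signal(y_t, y_s, y_n, gain=1.0):
    """Combine the components sample by sample: y = y_t + g * (y_s + y_n)."""
    if not (len(y_t) == len(y_s) == len(y_n)):
        raise ValueError("component signals must have equal length")
    return [t + gain * (s + n) for t, s, n in zip(y_t, y_s, y_n)]
```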

According to the present invention, for random-access frames the transmitted
quantisation table (Q) or an index (IND) is received from the encoder instead of the
representation levels r. The indication that the received frame is a random-access frame may
e.g. be implemented by adding an additional field in the bit stream syntax comprising the
appropriate index, e.g. as shown in Table 6, thereby identifying the specific quantisation table
(Q) to be used. From the Huffman code, the index is obtained. This index indicates the table
that is used for the ADPCM, as shown in Table 5. This table includes all possible
quantisation tables Q. The number of tables depends on the up-scale and down-scale factors and the
minimum and maximum values of the inner level.

If the current frame is a random-access frame, sub-frame K
includes, for each sinusoid in the sub-frame, the additional field of the bit stream syntax
having the value of a Huffman code (supplied to (QC) 62, (DQ) 60 and (PF) 64 as the trigger
signal (Trig.)). Furthermore, sub-frame K also includes the directly quantised amplitude,
frequency and phase for each sinusoid as specified by the encoder. The field of the bit stream
syntax is Huffman decoded and the appropriate table T is selected according to Table 5. This
table is then used for the de-quantiser (DQ) 60 in the next sub-frame (K+1). The prediction
filter (PF) 64 is re-initialised for sub-frame K+1 in the same way as is done for the first
continuation:

ψ(K-1) = φ(K) - ω(K)·U,

where U is the update interval. Here φ is the phase and ω is the frequency transmitted in
sub-frame K. The decoding continues in the traditional fashion as described above.
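
The re-initialisation can be illustrated as below. Taking ψ(K) equal to the transmitted phase, and using a linear extrapolation 2ψ(K) - ψ(K-1) as the second-order prediction, are assumptions made for this sketch; the filter coefficients are not given in this passage, and both function names are hypothetical.

```python
def reinitialise_predictor(phase_k, frequency_k, update_interval):
    """Return (psi(K-1), psi(K)) used to seed the second-order predictor."""
    psi_k = phase_k                                         # assumed: psi(K) = phi(K)
    psi_k_minus_1 = phase_k - frequency_k * update_interval
    return psi_k_minus_1, psi_k


def predict_next(psi_prev, psi_curr):
    """Assumed linear extrapolation standing in for the prediction filter."""
    return 2.0 * psi_curr - psi_prev
```

Under these assumptions the first prediction after a random-access frame equals φ(K) + ω(K)·U, i.e. the phase advances by one update interval at the transmitted frequency.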

Figure 6 shows an audio system according to the invention comprising an
audio encoder 1 as shown in Figure 1 and an audio player 3 as shown in Figure 4. Such a
system offers playing and recording features. The audio stream AS is furnished from the
audio encoder to the audio player over a communication channel 2, which may be a wireless
connection, a data bus 20 or a storage medium. In case the communication channel 2 is a
storage medium, the storage medium may be fixed in the system or may also be a removable
disc, a memory card or chip or other solid-state memory. The communication channel 2 may
be part of the audio system, but will however often be outside the audio system.

Figures 7a and 7b illustrate the information sent from the encoder and received
at the decoder according to the prior art and to the present invention, respectively. Figure 7a
shows a number of frames (701; 703) plotted by frame number and frequency, together with
the information or parameters that are transmitted from an encoder to a
decoder for each (sub-)frame according to the prior art. As can be seen, the initial phase (φ(0))
and initial frequency (ω(0)) are transmitted for the birth or start-of-track frame (701), while a
representation level r is transmitted for each other frame (703) belonging to the track.

Figure 7b illustrates a number of frames (701, 702, 703) plotted by frame
number and frequency according to the present invention, as well as the information or
parameters that are transmitted from an encoder to a decoder for each (sub-)frame. As can be
seen, the initial phase (φ(0)) and initial frequency (ω(0)) are transmitted for the birth or
start-of-track frame (701), as in Figure 7a, while a representation level r is transmitted for each
other frame (703) belonging to the track except for a random-access frame (702). For the
random-access frame (702) the initial phase (φ(0)) and initial frequency (ω(0)) are
transmitted from the encoder to the decoder together with the relevant quantisation table (Q)
(or an index, as explained before). In this way, at least some of the quantisation state is
transmitted from the encoder to the decoder, thereby avoiding audible artefacts, as explained
before, while not enlarging the required bit-rate too much.
CLAIMS:



1. A method of encoding an audio signal, the method comprising the steps of:

providing a respective set of sampled signal values (x(t)) for each of a plurality of sequential
time segments;

analysing the sampled signal values (x(t)) to determine one or more sinusoidal components
for each of the plurality of sequential segments;

linking sinusoidal components across a plurality of sequential segments to provide sinusoidal
tracks, each track comprising a number of frames; and

generating an encoded signal (AS) including sinusoidal codes (Cs) comprising a
representation level (r) for zero or more frames and where some of these codes (Cs) comprise
a phase (φ), a frequency (ω) and a quantisation table (Q) for a given frame when the given
frame is designated as a random-access frame.

2. A method according to claim 1, wherein a selection between a code for a
frame comprising a representation level (r) and a code for a frame comprising a phase (φ), a
frequency (ω) and a quantisation table (Q) is made in dependency of a trigger signal (Trig.).

3. A method according to claims 1-2, wherein each quantisation table (Q) is
represented by an index (IND) and where the index (IND) is transmitted from the encoder (1)
to the decoder (3) at a random-access frame (702) instead of transmitting the quantisation
table (Q).

4. A method according to claim 3, wherein the index (IND) is generated or 
represented using Huffman coding. 

5. A method according to claims 1-4, wherein the phase (φ) and the frequency
(ω) for a random-access frame are the initial phase (φ(0)) and the initial frequency (ω(0)).



6. A method of decoding an encoded audio stream (AS'), the method comprising
the steps of:

receiving a signal including the encoded audio stream (AS'), the audio stream (AS')
comprising tracks of sinusoidal codes (Cs), where the sinusoidal codes (Cs) comprise a
representation level (r) for zero or more frames and where some of these codes (Cs) comprise
a phase (φ), a frequency (ω) and a quantisation table (Q) for a given frame when the given
frame is designated as a random-access frame.

7. A method according to claim 6, wherein each quantisation table (Q) is 
represented by an index (IND) and where the index (IND) is received from an encoder (1) 
instead of reception of the quantisation table (Q) at a random-access frame (702). 


8. A method according to claim 7, wherein the index (IND) is generated or 
represented using Huffman coding. 

9. A method according to claims 6-8, wherein the phase (φ) and the frequency
(ω) for a random-access frame are the initial phase (φ(0)) and the initial frequency (ω(0)).

10. An audio encoder arranged to process a respective set of sampled signal values
for each of a plurality of sequential time segments, the encoder comprising:

an analyser for analysing the sampled signal values to determine one or more sinusoidal
components for each of the plurality of sequential segments;

a linker (13) for linking sinusoidal components across a plurality of sequential segments to
provide sinusoidal tracks, each track comprising a number of frames;

means (15) for providing an encoded signal (AS) including sinusoidal codes (Cs) comprising
a representation level (r) for zero or more frames and where some of these codes (Cs)
comprise a phase (φ), a frequency (ω) and a quantisation table (Q) for a given frame when
the given frame is designated as a random-access frame.

11. Audio player comprising:

means for receiving a signal including the encoded audio stream (AS'), the audio stream
(AS') comprising tracks of sinusoidal codes (Cs), where the sinusoidal codes (Cs) comprise
a representation level (r) for zero or more frames and where some of these codes (Cs)
comprise a phase (φ), a frequency (ω) and a quantisation table (Q) for a given frame when
the given frame is designated as a random-access frame, and

a synthesizer arranged to employ the zero or more received representation levels and the
received phase (φ), frequency (ω) and quantisation table (Q) for a given frame when the
given frame is designated as a random-access frame in order to synthesize the sinusoidal
components of the audio signal (y(t)).


12. Audio system comprising an audio encoder as claimed in claim 10 and an 
audio player as claimed in claim 11. 

13. Audio stream comprising sinusoidal codes (Cs) representing tracks of
sinusoidal components linked across a plurality of sequential time segments of an audio
signal, where the sinusoidal codes (Cs) comprise a representation level (r) for zero or more
frames and where some of these codes (Cs) comprise a phase (φ), a frequency (ω) and a
quantisation table (Q) for a given frame when the given frame is designated as a
random-access frame.


14. Storage medium on which an audio stream as claimed in claim 13 has been 
stored. 
ABSTRACT:



Coding of an audio signal (x) represented by a respective set of sampled signal
values (x(t)) for each of a plurality of sequential time segments is disclosed. The sampled
signal values are analysed to determine one or more sinusoidal components for each of the
plurality of sequential segments. The sinusoidal components are linked across a plurality of
sequential segments to provide sinusoidal tracks, where each track comprises a number of
frames. An encoded signal (AS) is generated including sinusoidal codes (Cs) comprising a
representation level (r) for each frame or including sinusoidal codes (Cs) where some of these
codes comprise a phase (φ), a frequency (ω) and a quantisation table (Q) for a given frame
when the given frame is designated as a random-access frame.
The invention enables random access in a track while avoiding long adaptation of the
quantisation accuracy in a quantiser and/or the need for a large bit-stream while still
maintaining improved audio quality.